Mathematical Modelling of the Pattern of Occurrence of Words in Different Corpora of the Hindi Language
نویسندگان
چکیده
The incidence of different components of language in natural language texts are not arbitrarily organized but tend to obey particular laws which enable us to explain characteristic features of human language. The present paper is an attempt to analyse and model the pattern of occurrence of words in the Hindi language. Various kinds of corpora have been selected from different sources for the study, and the occurrence of words in these corpora has been observed for a variety of properties such as: frequencies, vocabulary measures, and pattern of initials of words relative to the subsequent matra.
منابع مشابه
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
متن کاملVocabulary Lists for EAP and Conversation Students
Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملA Comparative Analysis of Lexical Bundles in Journalistic Writing in English and Persian: A Contrastive Linguistic Perspective
This paper investigates the use of ‘lexical bundles’ in two broad corpora of journalistic writing. The aim of this study is to compare the use of lexical bundles in the two domains, one consisted of newspaper articles written in English and published in England and the other one comprised of newspaper articles written in Persian from Iranian publications. For this purpose, the frequency...
متن کاملThe Impact of Different Frequency Patterns on the Syntactic Production of a 6-year-old EFL Home Learner: A Case Study
This longitudinal study investigated the impact of different Frequency Patterns (FP) on the syntactic production of a six-year-old EFL learner in a home context. Target syntactic constructions were presented using games and plays and were traced for their occurrence patterns in input and output. Following each instruction period, the constructions were measured through immediate and delayed ora...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of Quantitative Linguistics
دوره 20 شماره
صفحات -
تاریخ انتشار 2013